[SPARK-57681][SQL] Support dynamic table options for UPDATE#56792
[SPARK-57681][SQL] Support dynamic table options for UPDATE#56792anuragmantri wants to merge 1 commit into
Conversation
|
@szehon-ho, tagging you since you made the INSERT counterpart :) |
| // don't rewrite as the table supports truncation | ||
| d | ||
|
|
||
| case r @ ExtractV2Table(t: SupportsRowLevelOperations) => |
There was a problem hiding this comment.
Please consider this concern: silent option-drop on non-rewrite delete paths: for a table implementing both SupportsRowLevelOperations and SupportsDeleteV2 (e.g. Iceberg), a DELETE … WITH(...) WHERE can take the metadata-only/deleteWhere path (OptimizeMetadataOnlyDeleteFromTable / DataSourceV2Strategy), which has no options parameter; so the user's options are silently ignored. Same for the TruncatableTable truncate path. This would be a new user-visible "silently ignored" surprise, and no test currently covers this path.
There was a problem hiding this comment.
Great catch. I looked at those paths. The options are indeed not being passed all the way. Fixing this requires DSv2 API changes something like this with default methods.
// SupportsDeleteV2
default void deleteWhere(Predicate[] predicates, CaseInsensitiveStringMap options) {
deleteWhere(predicates); // default ignores options → fully back-compatible
}
// TruncatableTable
default boolean truncateTable(CaseInsensitiveStringMap options) {
return truncateTable();
}I can make this change in this same PR or can do a follow up if DSv2 changes need separate PRs. Let me know what you think is the best way forward.
| | UPDATE identifierReference tableAlias setClause whereClause? #updateTable | ||
| | DELETE FROM identifierReference optionsClause? tableAlias whereClause? #deleteFromTable | ||
| | UPDATE identifierReference optionsClause? tableAlias setClause whereClause? #updateTable | ||
| | MERGE (WITH SCHEMA EVOLUTION)? INTO target=identifierReference targetAlias=tableAlias |
There was a problem hiding this comment.
So this PR makes INSERT/DELETE/UPDATE accept WITH(...), but how about MERGE?
There was a problem hiding this comment.
I intentionally did not add MERGE as the semantics are a bit different since MERGE statements have source and target relations, I'm still thinking about how options would look like there. I can file a separate JIRA and PR for MERGE later.
| stop = 56)) | ||
| } | ||
|
|
||
| test("delete from table: with options") { |
There was a problem hiding this comment.
Please make sure to either update the SQL reference docs, or file a followup Jira ticket to do so later.
There was a problem hiding this comment.
For sure. I took a look and looks like INSERT docs are also not updated. I will create a follow up JIRA to update all options related DMLS at once.
|
@dongjoon-hyun - Could you please review this PR since you had reviewed its precedent (SPARK-49098) as well? Thank you! |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Thank you for pinging me, @anuragmantri .
In general, given the code contribution, the claim of this PR title is too broad. When you make a PR, please revise the PR title by narrowing down as much as possible to the exact contribution scope.
We don't need to add MERGE here. Instead, I'd like to recommend to proceed DELETE FROM syntax in another JIRA issue after removing this PR because it could be misleading if we ignore options. For UPDATE syntax, IIUC, there is no silent ignoring code path.
Lastly, follow-up have a special meaning in the community. So, please don't say follow-up of something especially when the Fix Version is different.
If you narrow down the scope according to the above, I believe we can get a higher chance of merge to Apache Spark 4.3.0, @anuragmantri .
398cadc to
50072c3
Compare
|
Thanks for the valuable feedback @dongjoon-hyun. As you suggested, I have narrowed the scope of this JIRA and the PR to only This is ready for another round. Thanks! |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
What happens for the following queries?
UPDATE t WITH (...) ...
WHERE id IN (SELECT ... FROM t)
We don't consider these new options for analyzer relation caches, do we?
Given that, it seems that there are more corner cases with analyzer relation cache. Could you add more test coverage and verity them? Typically, Subquery and CTE are the corner cases we need to consider carefully, @anuragmantri .
| | fromClause multiInsertQueryBody+ #multiInsertQuery | ||
| | DELETE FROM identifierReference tableAlias whereClause? #deleteFromTable | ||
| | UPDATE identifierReference tableAlias setClause whereClause? #updateTable | ||
| | UPDATE identifierReference optionsClause? tableAlias setClause whereClause? #updateTable |
There was a problem hiding this comment.
Although this seems to follow read path, this is inconsistent with our existing INSERT syntax.
UPDATE should follow INSERT style.
| stop = 70)) | ||
| } | ||
|
|
||
| test("update table: with options") { |
There was a problem hiding this comment.
Please use test prefix when it's possible.
- test("update table: with options") {
+ test("SPARK-57681: update table: with options") {
|
I finished the second-round review, @anuragmantri .
|
What changes were proposed in this pull request?
This PR add the dynamic table options syntax to UPDATE statements.
Why are the changes needed?
This is a continuation to adding options syntax in DMLs SPARK-49098
Does this PR introduce any user-facing change?
Yes, it adds new SQL syntax. For example:
How was this patch tested?
Added tests to the following suites.
Was this patch authored or co-authored using generative AI tooling?
I used Claude Code (Claude Opus 4.8) to generate the code and tests and verified manually.